August 17, 2017

Overview

Goal

  • Leave confident you can use ggplot

Agenda

  1. Intro and Motivation
  2. Grammar
  3. Toolkit
  4. Facets, Scales, Themes
  5. Full Example

ggplot and R

  • R is a language for statistical computing and data science

  • ggplot2 is an R package for data visualization

Motivation

Grammar

Fuel Economy Data

mpg
manufacturer  model        displ  hwy  cyl  class
audi          a4           1.8    29   4    compact
audi          a4           1.8    29   4    compact
dodge         caravan 2wd  2.4    24   4    minivan
dodge         caravan 2wd  3.0    24   6    minivan

Scatterplot Code

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

  1. Plot function: ggplot()
  2. The data: mpg
  3. Aesthetic mapping: aes(...)
  4. A layer: geom_point()

Map class to color

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

Setting the value of color

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "darkgreen")

Mapping Versus Setting

Mapping

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class))

Setting

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "darkgreen")
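A common slip is to put a constant inside aes(). The sketch below contrasts the two forms. It uses ggplot2's layer_data(), which returns the computed data for one layer, to show which colour each point actually receives.

```r
library(ggplot2)

# Setting: a constant outside aes() is applied to every point as-is.
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "darkgreen")

# Mapping a constant: inside aes(), "darkgreen" is treated as a one-level
# categorical variable, so the default hue scale picks the colour and a
# legend labelled "darkgreen" appears.
p_map <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "darkgreen"))

unique(layer_data(p_set)$colour)  # "darkgreen"
unique(layer_data(p_map)$colour)  # a default hue-scale colour, not "darkgreen"
```

Rule of thumb: use aes() when the value comes from a data column, and a plain argument when it is a fixed choice.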

Linear model fit

... + geom_smooth(method = "lm")

Mapping Color

Method 1: map color in one layer

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE)

Method 2: map color for all layers

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

One smooth per class

ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

One smooth

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", se = FALSE)
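The difference between the two slides is where color is mapped. A discrete mapping in ggplot() is inherited by every layer and also splits the data into groups, so geom_smooth() fits one line per group. The sketch checks this by counting the groups in the smooth layer with layer_data(). Passing formula = y ~ x only silences the "using formula" message.

```r
library(ggplot2)

# Global mapping: every layer inherits color = class, so geom_smooth()
# fits one regression line per class.
p_global <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer mapping: only geom_point() sees color, so geom_smooth() fits one line.
p_layer <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

length(unique(layer_data(p_global, 2)$group))  # 7, one per class
length(unique(layer_data(p_layer, 2)$group))   # 1
```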

A layer consists of

  1. Data (mpg)
  2. Aesthetic Mapping (aes())
  3. Geometric Object (geom_)
  4. Statistical Transformation (stat = "count")

Statistical Transformation

ggplot(mpg, aes(x = class)) + 
  geom_bar(stat = "count")
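With stat = "count", geom_bar() does the tallying itself. The sketch below draws the same bars from counts computed beforehand, using geom_col(), which keeps the y values as given (stat_identity()). The class_counts name is just for illustration.

```r
library(ggplot2)

# geom_bar() with stat = "count" counts the rows in each class.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar(stat = "count")

# The same bars from precomputed counts; geom_col() draws y as-is.
class_counts <- as.data.frame(table(class = mpg$class))  # columns: class, Freq
p_identity <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()
```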

Grammar Summary

  1. Plots are built from a data frame

  2. Map data columns to aesthetic properties

  3. Build plots iteratively
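"Build plots iteratively" works because a ggplot is an ordinary R object. You can store it in a variable and add layers with + one at a time. Nothing is drawn until the object is printed. A minimal sketch:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy))    # data + mapping, no layers yet
p <- p + geom_point(aes(color = class))      # add a point layer
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # add a smooth

print(p)  # drawing happens here
```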

Geom Toolkit

Boxplot

ggplot(mpg, aes(x = class, y = hwy)) + 
  geom_boxplot()

Adding Points

ggplot(mpg, aes(x = class, y = hwy)) + 
  geom_boxplot() + 
  geom_point()

Position Adjustments

ggplot(mpg, aes(x = class, y = hwy)) + 
  geom_boxplot() + 
  geom_jitter()
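geom_jitter() is a shortcut for geom_point(position = "jitter"). Writing out position_jitter() gives control over the noise. In the sketch, width = 0.2 keeps points within their class, height = 0 leaves the hwy values untouched, and seed makes the jitter reproducible; these particular values are illustrative. outlier.shape = NA is an extra choice that hides the boxplot's own outlier points so they are not drawn twice.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +  # jittered points already show outliers
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1))
```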

Histograms

ggplot(mpg, aes(x = hwy)) +
  geom_histogram()
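Called with no bins or binwidth, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better value. Setting the bin width explicitly avoids this. Here binwidth = 2 (2 mpg per bar) is an illustrative choice:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)  # each bar spans 2 highway mpg
```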

Line Plot

ggplot(economics, aes(x = date, y = psavert)) +
  geom_line()

Area Plot

... + geom_area()

Ribbon

... +
  geom_ribbon(aes(ymin = psavert - 0.5, ymax = psavert + 0.5))
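The Area and Ribbon slides elide the base call with "...". The sketch below assumes it is the economics line plot above, with date and psavert mapped in ggplot(). In the ribbon version the ribbon is added before the line so the line is drawn on top.

```r
library(ggplot2)

# Assumed shared base: x = date, y = psavert for every layer.
base <- ggplot(economics, aes(x = date, y = psavert))

p_line <- base + geom_line()
p_area <- base + geom_area()  # fills the region between 0 and psavert

# A band 0.5 percentage points either side of the savings rate.
p_ribbon <- base +
  geom_ribbon(aes(ymin = psavert - 0.5, ymax = psavert + 0.5)) +
  geom_line()
```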

Heat Map: Old Faithful Eruptions

... + geom_raster(aes(fill = density))

Contour Plots

... + geom_contour(aes(z = density))
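Both slides elide the base call. A plausible source is ggplot2's faithfuld dataset, a 2D density estimate of Old Faithful's waiting time and eruption length on a regular grid (columns waiting, eruptions, density); that choice is an assumption here. The sketch layers the contour lines over the heat map. White contour lines are an illustrative styling choice.

```r
library(ggplot2)

p <- ggplot(faithfuld, aes(x = waiting, y = eruptions)) +
  geom_raster(aes(fill = density)) +                # one tile per grid cell
  geom_contour(aes(z = density), color = "white")   # lines of equal density
```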

Help on Layers

help(geom_boxplot)

geom_boxplot understands the following aesthetics (the help page marks the required ones in bold; this is an excerpt):

  • x
  • ymin
  • ymax
  • alpha
  • colour

Faceting

Highway MPG vs. Displacement

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

Facet by Cylinder

... + facet_wrap(~ cyl)

Free Scales

... + facet_wrap(~ cyl, scales = "free")

One Smooth Per Facet

... + facet_wrap(~ cyl, scales = "free") +
  geom_smooth(method = "lm")
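Put together, the faceting slides give the call below. facet_wrap(~ cyl) creates one panel per cylinder count, and scales = "free" lets each panel choose its own x and y ranges. Because each panel holds its own subset of the data, geom_smooth() fits a separate line per panel. formula = y ~ x only silences the fitting message.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~ cyl, scales = "free")
```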

Scales

Scales

A scale controls how data values map to a visual property (position, color, shape, ...) and how the matching axis or legend is drawn

Displacement, Highway, Cylinders

ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()

Name

... + scale_x_continuous("Engine Displacement")

Limits, Breaks

... + scale_y_continuous("Highway MPG", limits = c(0, 50), 
                         breaks = seq(0, 50, by = 5))

Color Scale

... + scale_color_hue("Cylinders", l = 80, c = 60)
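The three scale edits can be combined on one plot, as below. layer_scales() (a ggplot2 helper that returns the trained x and y scales of a plot) is used only to check the result. In scale_color_hue(), l sets luminance and c sets chroma, so l = 80 and c = 60 give lighter, less saturated colors than the defaults (l = 65, c = 100).

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  scale_x_continuous("Engine Displacement") +           # axis title
  scale_y_continuous("Highway MPG", limits = c(0, 50),  # fixed range
                     breaks = seq(0, 50, by = 5)) +     # tick every 5 mpg
  scale_color_hue("Cylinders", l = 80, c = 60)          # legend title, softer hues
```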

Scale Toolbox

Position            Color                   Shape
scale_x_continuous  scale_color_gradient    scale_shape_discrete
scale_x_discrete    scale_color_discrete    scale_shape_manual
scale_x_date
scale_x_datetime
scale_x_sqrt
scale_x_log10

Themes

Themes

A theme controls the non-data elements of the plot: text, backgrounds, grid lines, and legend layout

x-axis tick labels

... + theme(axis.text.x = element_text(size = 16))

y-axis title rotation

... + theme(axis.title.y = element_text(angle = 0))

Sample Theme Elements

Plot          Axis        Legend       Panel          Faceting
plot.title    axis.line   legend.key   panel.border   strip.text
plot.margin   axis.text   legend.text  aspect.ratio   panel.spacing

Complete Themes

... + theme_bw()
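Complete themes and individual theme() tweaks combine, but order matters. Adding a complete theme such as theme_bw() replaces the whole theme, so it should come first and the tweaks after it. The sketch stores the combination in a variable (my_theme is a hypothetical name) so it can be reused. calc_element() is used only to inspect the resulting settings.

```r
library(ggplot2)

my_theme <- theme_bw() +
  theme(axis.text.x  = element_text(size = 16),  # larger x-axis tick labels
        axis.title.y = element_text(angle = 0))  # horizontal y-axis title

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  my_theme
```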

Bring it together

Motivating Plot

Personal Savings Rate

economics
date psavert
1967-07-01 12.5
1967-08-01 12.5
1967-09-01 11.7
1967-10-01 12.5
1967-11-01 12.5
1967-12-01 12.1

Line Plot

ggplot(economics) +
  geom_line(aes(x = date, y = psavert), color = "orange")

Uncertainty

... + geom_ribbon(aes(x = date, ymin = psavert - 1, ymax = psavert + 1))

Presidential Data

pres2
name start end party
Reagan 1981-01-20 1989-01-20 Republican
Bush 1989-01-20 1993-01-20 Republican
Clinton 1993-01-20 2001-01-20 Democratic
Bush 2001-01-20 2009-01-20 Republican
Obama 2009-01-20 2017-01-20 Democratic

US Presidents

... + geom_rect(aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party), data = pres2)

Color Scale

... + scale_fill_manual(values = c("blue", "red"))

Date Scale

... + scale_x_date(date_breaks = "4 years", date_labels = "%Y")

Separating Lines

... + geom_vline(aes(xintercept = start), data = pres2) 

Text

... + geom_text(aes(x = start, y = Inf, label = name), data = pres2, vjust = 1)

Labels

... + labs(x = NULL, y = "Rate", title = "Personal Savings Rate") 

Theme

... +
  theme_bw()

The Layers

# Layer order only; arguments are omitted (see the slides above)
ggplot(economics) +
  geom_ribbon() +
  geom_line() +
  geom_rect(data = pres2) +
  scale_fill_manual() +
  scale_x_date() +
  geom_vline() +
  geom_text() +
  labs() +
  theme_bw()
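A runnable version of the full plot is sketched below. The slides show pres2's contents but not how it was built. Here it is assumed to be the Reagan-through-Obama rows of ggplot2's presidential dataset. The ribbon fill, the rectangle alpha, and the text placement (y = Inf with vjust = 1 puts each name at the top edge; hjust = 0 left-aligns it at the start of the term) are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# Hypothetical construction of pres2 from ggplot2's presidential data.
pres2 <- subset(presidential,
                start >= as.Date("1981-01-20") & end <= as.Date("2017-01-20"))

p <- ggplot(economics) +
  geom_ribbon(aes(x = date, ymin = psavert - 1, ymax = psavert + 1),
              fill = "grey70") +
  geom_line(aes(x = date, y = psavert), color = "orange") +
  # One shaded band per term; alpha keeps the line and ribbon visible.
  geom_rect(aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf, fill = party),
            data = pres2, alpha = 0.2) +
  # Levels sort alphabetically: Democratic -> blue, Republican -> red.
  scale_fill_manual(values = c("blue", "red")) +
  scale_x_date(date_breaks = "4 years", date_labels = "%Y") +
  geom_vline(aes(xintercept = start), data = pres2) +
  geom_text(aes(x = start, y = Inf, label = name), data = pres2,
            vjust = 1, hjust = 0) +
  labs(x = NULL, y = "Rate", title = "Personal Savings Rate") +
  theme_bw()

print(p)
```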

ggplot2

  • package for data visualization

  • grammar of graphics

  • based on data

  • build layer by layer

  • rapid experimentation

  • fine-grained control

Learning More